Consonant landmark detection for speech recognition

نویسنده

Chi-youn Park

چکیده

This thesis focuses on the detection of abrupt acoustic discontinuities in the speech signal, which constitute landmarks for consonant sounds. Because a large amount of phonetic information is concentrated near acoustic discontinuities, more focused speech analysis and recognition can be performed based on the landmarks. Three types of consonant landmarks are defined according to its characteristics—glottal vibration, turbulence noise, and sonorant consonant—so that the appropriate analysis method for each landmark point can be determined. A probabilistic knowledge-based algorithm is developed in three steps. First, landmark candidates are detected and their landmark types are classified based on changes in spectral amplitude. Next, a bigram model describing the physiologically-feasible sequences of consonant landmarks is proposed, so that the most likely landmark sequence among the candidates can be found. Finally, it has been observed that certain landmarks are ambiguous in certain sets of phonetic and prosodic contexts, while they can be reliably detected in other contexts. A method to represent the regions where the landmarks are reliably detected versus where they are ambiguous is presented. On TIMIT test set, 91% of all the consonant landmarks and 95% of obstruent landmarks are located as landmark candidates. The bigram-based process for determining the most likely landmark sequences yields 12% deletion and substitution rates and a 15% insertion rate. An alternative representation that distinguishes reliable and ambiguous regions can detect 92% of the landmarks and 40% of the landmarks are judged to be reliable. The deletion rate within reliable regions is as low as 5%. The resulting landmark sequences form a basis for a knowledge-based speech recognition system since the landmarks imply broad phonetic classes of the speech signal and indicate the points of focus for estimating detailed phonetic information. In addition, because the reliable regions generally correspond to lexical stresses and word boundaries, it is expected that the landmarks can guide the focus of attention not only at the phoneme-level, but at the phrase-level as well. Thesis Supervisor: Kenneth N. Stevens Title: Professor of Electrical Engineering and Computer Science

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Obstruent Consonant Landmark Detection in Thai Continuous Speech

The presence of obstruent consonants constitutes key landmark events with cues that indicate abrupt acoustic discontinuities in the speech signal. Such discontinuities allow further analysis and recognition to be performed in knowledge-based speech recognition systems. This paper describes an acoustical investigation on Thai obstruent consonant detection using average level crossing rate (ALCR)...

متن کامل

Application of Local Binary Patterns for SVM based Stop Consonant Detection

Detection of acoustic phonetic landmarks is useful for a variety of speech processing applications such as automatic speech recognition.The majority of existing methods use Melfrequency Cepstral Coefficients (MFCCs) describing the short time power spectral envelope of the speech signal. This paper hypothesizes that a different feature extraction method can be used to complement MFCCs by capturi...

متن کامل

Nasal detection module for a knowledge-based speech recognition system

The Lexical Access From Features (LAFF) project tries to model the representation and perception of speech by human listeners. The derivation of such a representation involves first finding certain acoustic landmarks. Based on the landmarks and the acoustic cues surrounding the landmarks, distinctive features of the speech segments may be deciphered. The present study concentrates on the nasali...

متن کامل

A probabilistic framework for landmark detection based on phonetic features for automatic speech recognition.

A probabilistic framework for a landmark-based approach to speech recognition is presented for obtaining multiple landmark sequences in continuous speech. The landmark detection module uses as input acoustic parameters (APs) that capture the acoustic correlates of some of the manner-based phonetic features. The landmarks include stop bursts, vowel onsets, syllabic peaks and dips, fricative onse...

متن کامل

Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods

Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterance into transcription. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2008

Consonant landmark detection for speech recognition

نویسنده

چکیده

منابع مشابه

Obstruent Consonant Landmark Detection in Thai Continuous Speech

Application of Local Binary Patterns for SVM based Stop Consonant Detection

Nasal detection module for a knowledge-based speech recognition system

A probabilistic framework for landmark detection based on phonetic features for automatic speech recognition.

Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods

عنوان ژورنال:

اشتراک گذاری